*Testing methods to fill in missing values for missing quarters when at least 1 
*year isn't complete 

*the purpose of this .do file is to create a "fake" data file where I 
*generate observations for quarters where we are missing information. 
*I am doing this so that I can test out Multiple Interpolation methods 
*I would like to be able to predict the missing data for the properties that 
*are missing a quarter before we compile the data annually 
global NCREIF 	"P:\Efficiency Financing\NCREIF\Summer_2017" 
global OrgData 	"$NCREIF\Original_Data"
global GenData 	"$NCREIF\Generated_Data"
global Logs 	"$NCREIF\Log_Files"
global Datasets "P:\Efficiency Financing\NCREIF\Stata\Datasets"
global Excel 	"P:\Efficiency Financing\NCREIF\Stata\Excel"
global Prices 	"P:\Efficiency Financing\NCREIF\Stata\Energy Prices"
global JC = "\\rff-fs1\Projects\Efficiency Financing\NCREIF\Stata"
global Unemployment = "$JC\Unemployment"
global Fake 	"$NCREIF\FakeFiles" 

clear all 
set more off 

*------------------------------------------------------------------------------*
*			Step 1- Make the "fake data" file with all missing Obs 			   *
*------------------------------------------------------------------------------*
{
*Import the pre-cleaned data 
use "$GenData\NCREIF_wmissingquartersinfo.dta", clear 
bysort propnum: gen id_unique=_n==1
keep if id_unique==1
keep propnum 
tempfile UniquePropnum
save "`UniquePropnum'" 

*Generate every possible year-quarter-property pair
clear all 
set obs 16 
gen year=2000
replace year=year[_n-1]+1 if _n>1
gen quarter=1
expand 2, generate(duplicate)
replace quarter=2 if duplicate==1
capture drop duplicate 
expand 2, generate(duplicate) 
replace quarter=3 if quarter==1 & duplicate==1
replace quarter=4 if quarter==2 & duplicate==1
capture drop duplicate 

cross using "`UniquePropnum'"
tempfile EveryYQPCombo
save "`EveryYQPCombo'" 

use "$GenData\NCREIF_wmissingquartersinfo.dta", clear
merge 1:1 propnum year quarter using "`EveryYQPCombo'"	

/*

    Result                           # of obs.
    -----------------------------------------
    not matched                       259,892
        from master                         0  (_merge==1)
        from using                    259,892  (_merge==2)

    matched                            97,548  (_merge==3)
    -----------------------------------------

. 
*/

*Now I need to find which unmerged options are actually missing in the dataset*
*Ie wich year-quarter-property pairs are missing from the NCREIF dataset
sort propnum year quarter 
by propnum year: carryforward zip, gen(filled_zip)
replace filled_zip=filled_zip[_n+1] if year==year[_n+1] & propnum==propnum[_n+1] & filled_zip==.
replace filled_zip=filled_zip[_n+1] if year==year[_n+1] & propnum==propnum[_n+1] & filled_zip==.
replace filled_zip=filled_zip[_n+1] if year==year[_n+1] & propnum==propnum[_n+1] & filled_zip==.
gen fake_data=1 if _merge==2 
keep if filled_zip!=.


count if filled_zip!=. & fake_data==1


*This gives me 10336 observations thta I have added into the data set that were
*missing before,

*so now I am going to try some other methods to figure out which observatinos are still missing 
	gen zip1=filled_zip if quarter==1
	gen zip2=filled_zip if quarter==2
	gen zip3=filled_zip if quarter==3
	gen zip4=filled_zip if quarter==4

	*First I some summary statistics of what exactly we are missing 
	bys propnum year: egen zipc1=total(zip1), missing
	bys propnum year: egen zipc2=total(zip2), missing
	bys propnum year: egen zipc3=total(zip3), missing
	bys propnum year: egen zipc4=total(zip4), missing
	
	*Generate a value that tells us which quarter is missing for a particular year-property pair 
	gen miss_zip1=1 if missing(zipc1)
	gen miss_zip2=1 if missing(zipc2)
	gen miss_zip3=1 if missing(zipc3)
	gen miss_zip4=1 if missing(zipc4)
	
	count if miss_zip1==1
	count if miss_zip2==1
	count if miss_zip3==1
	count if miss_zip4==1
	
*It looks like I have filled in all of the missing observations

drop zip1 zip2 zip3 zip4 zipc1 zipc2 zipc3 zipc4 miss_zip*
*These next loops fill in the missing information for the variables that are the
*"fake" observations that I created if the independent var is constant over time 
by propnum year: carryforward prop cbsa, replace
*list of time-invariant Vars
by propnum year: carryforward prop cbsa cbsadiv cbsaname cbsaordiv lifecycle ///
propertytype propertysubtype address1 state city county propertysubtype address1 ///
state city county censusdivfips acqdate acquisitionyear acquisitiondate initialcost ///
initacqcost percentleased saleqtr salecode yearsold saleprice netsalepriceweath ///
grosssaleprice mgrname mgrid nooffloors noofbuildings noofunits nra sqft ///
lastrenovatedyear yrbuiltorlastren Covered_E propnum yrbuilt rating ind_yr last_year_cert ///
years_since_cert ind_cert_last5 ind_cert_last10 numqts non0 non0qts complete miss_qt1 ///
miss_qt2 miss_qt3 miss_qt4 everrated, replace

*These next loops fill in the missing information for the variables that are the
*"fake" observations that I created 
local variables "prop cbsa cbsadiv cbsaname cbsaordiv lifecycle propertytype propertysubtype address1 state city county propertysubtype address1"
foreach var of local variables{
	replace `var'=`var'[_n+1] if year==year[_n+1] & propnum==propnum[_n+1]
	replace `var'=`var'[_n+2] if year==year[_n+2] & propnum==propnum[_n+2]
	replace `var'=`var'[_n+3] if year==year[_n+3] & propnum==propnum[_n+3]
	}
	
local variables "censusdivfips acqdate acquisitionyear acquisitiondate initialcost "
foreach var of local variables{
	replace `var'=`var'[_n+1] if year==year[_n+1] & propnum==propnum[_n+1]
	replace `var'=`var'[_n+2] if year==year[_n+2] & propnum==propnum[_n+2]
	replace `var'=`var'[_n+3] if year==year[_n+3] & propnum==propnum[_n+3]
	}
	
local variables "percentleased saleqtr salecode yearsold saleprice netsalepriceweath"
foreach var of local variables{
	replace `var'=`var'[_n+1] if year==year[_n+1] & propnum==propnum[_n+1]
	replace `var'=`var'[_n+2] if year==year[_n+2] & propnum==propnum[_n+2]
	replace `var'=`var'[_n+3] if year==year[_n+3] & propnum==propnum[_n+3]
	}

local variables "grosssaleprice mgrname mgrid nooffloors noofbuildings noofunits nra sqft"
foreach var of local variables{
	replace `var'=`var'[_n+1] if year==year[_n+1] & propnum==propnum[_n+1]
	replace `var'=`var'[_n+2] if year==year[_n+2] & propnum==propnum[_n+2]
	replace `var'=`var'[_n+3] if year==year[_n+3] & propnum==propnum[_n+3]
	}	

local variables "lastrenovatedyear yrbuiltorlastren Covered_E propnum yrbuilt rating ind_yr last_year_cert"
foreach var of local variables{
	replace `var'=`var'[_n+1] if year==year[_n+1] & propnum==propnum[_n+1]
	replace `var'=`var'[_n+2] if year==year[_n+2] & propnum==propnum[_n+2]
	replace `var'=`var'[_n+3] if year==year[_n+3] & propnum==propnum[_n+3]
	}	

local variables "years_since_cert ind_cert_last5 ind_cert_last10 numqts non0 non0qts complete miss_qt1"
foreach var of local variables{
	replace `var'=`var'[_n+1] if year==year[_n+1] & propnum==propnum[_n+1]
	replace `var'=`var'[_n+2] if year==year[_n+2] & propnum==propnum[_n+2]
	replace `var'=`var'[_n+3] if year==year[_n+3] & propnum==propnum[_n+3]
	}
	
local variables "miss_qt2 miss_qt3 miss_qt4 everrated"
foreach var of local variables{
	replace `var'=`var'[_n+1] if year==year[_n+1] & propnum==propnum[_n+1]
	replace `var'=`var'[_n+2] if year==year[_n+2] & propnum==propnum[_n+2]
	replace `var'=`var'[_n+3] if year==year[_n+3] & propnum==propnum[_n+3]
	}	
capture drop _merge
*first create yyyyq & zipcode var for the fake data 
egen yrq = concat(year quarter)
replace yyyyq=yrq if fake_data==1
count if missing(yyyyq)
drop yrq
replace zip= filled_zip if fake_data==1
drop filled_zip 
drop HDD CDD missingtemp count elecprice Unemployment gasprice CPI Year ///
real_inc_total real_exp_total real_exp_util real_valuepersf real_valueperunit ///
real_utilpersf real_elecprice real_gasprice real_inc_brent logrealutilpersf ///
util_pct logrealutilpersf2 ever_merged rating ind_yr last_year_cert ///
years_since_cert ind_cert_last5 ind_cert_last10 dup_qtr firstobs numqts ///
non0 non0qts complete qt1 qt2 qt3 qt4 qtr1 qtr2 qtr3 qtr4 miss_qt1 miss_qt2 ///
miss_qt3 miss_qt4 everrated uniquebuilding buildingevercert buildingnevercert firstyearindata lastyearindata
save "$Fake\NCREIF_wFakeData06062017.dta", replace
}
*------------------------------------------------------------------------------*
*		Step 2- Merge the Fake File with the control Variables				   *
*------------------------------------------------------------------------------*
{
use "$Fake\NCREIF_wFakeData06062017.dta", replace

*Export Zip and CBSA Lists
keep zip
duplicates drop zip, force
save "$Fake\Fakeziplist2016", replace

use "$Fake\NCREIF_wFakeData06062017.dta", replace
keep cbsa
duplicates drop cbsa, force
save "$Fake\Fakecbsalist2016", replace

*-----------------------------------------*
*	   	Step 2.1 Unemployment 			  *
*-----------------------------------------*	

import excel "$Unemployment\CBSA Crosswalk.xlsx", firstrow clear cellrange(A3)
keep CBSACode CBSATitle
duplicates drop CBSACode, force
gen cbsa = real(CBSACode)
drop if cbsa == .
merge 1:1 cbsa using "$Fake\Fakecbsalist2016"

keep if _merge == 3
drop _merge
gen CBSA1 = substr(CBSATitle, 1, strpos(CBSATitle, "-")-1)
gen CBSA2 = substr(CBSATitle, strpos(CBSATitle, ","), length(CBSATitle))
replace CBSATitle = CBSA1 + CBSA2 if strpos(CBSATitle, "-") != 0
replace CBSATitle = CBSA1 if length(CBSATitle) - length(subinstr(CBSATitle, ",", "", .)) >= 2 & strpos(CBSATitle, "part") == 0
/*
replace CBSATitle = "Cincinnati, OH-KY-IN" if CBSATitle == "Cincinnati, OH"
replace CBSATitle = "Fort Madison, IA-MO" if CBSATitle == "Fort Madison, IA-IL-MO"
replace CBSATitle = "La Crosse, WI" if CBSATitle == "La Crosse, WI-MN"
replace CBSATitle = "Myrtle Beach, SC" if CBSATitle == "Myrtle Beach, SC-NC"
replace CBSATitle = subinstr(CBSATitle, "/Jefferson County", "", .)
replace CBSATitle = "Santa Barbara, CA" if strpos(CBSATitle, "Santa Maria") != 0
replace CBSATitle = "Louisville, KY-IN" if CBSATitle == "Louisville, KY"
replace CBSATitle = "Steubenville, OH-WV" if CBSATitle == "Weirton, WV-OH"
replace CBSATitle = "Honolulu, HI" if CBSATitle == "Urban Honolulu, HI"
replace CBSATitle = "Pendleton, ORetail" if CBSATitle == "Hermiston, ORetail"
replace CBSATitle = "Southern Pines, NC" if CBSATitle == "Pinehurst, NC"
*/
replace CBSATitle = "Norwich, CT-RI" if CBSATitle == "Norwich, CT"
save "$Fake\FakeCBSACrosswalk2016", replace

*Match CBSA Code to LAU Code via Title
import excel "$Unemployment\LAUCBSACrosswalk_updatein2016.xlsx", firstrow clear
drop if strpos(area_text, "Metropolitan Statistical Area") + strpos(area_text, "Micropolitan Statistical Area") + strpos(area_text, "NECTA")== 0
foreach x in Metropolitan Micropolitan {
replace area_text = subinstr(area_text, " `x' Statistical Area", "", .)
replace area_text = subinstr(area_text, " `x' NECTA", "", .)
}
rename area_text CBSATitle
gen CBSA1 = substr(CBSATitle, 1, strpos(CBSATitle, "-")-1)
gen CBSA2 = substr(CBSATitle, strpos(CBSATitle, ","), length(CBSATitle))
replace CBSATitle = CBSA1 + CBSA2 if strpos(CBSATitle, "-") != 0
replace CBSATitle = CBSA1 if length(CBSATitle) - length(subinstr(CBSATitle, ",", "", .)) >= 2 & strpos(CBSATitle, "part") == 0

replace area_code = substr(area_code,1,8)
merge 1:1 CBSATitle using "$Fake\FakeCBSACrosswalk2016"
keep if _merge == 3
drop _merge
keep area_code CBSATitle cbsa
save "$Fake\FakeFileformerge", replace

*Import the LAU unemployment figures through 2016
import excel "$Fake\LauFiguresDownloaded06062017.xlsx", sheet("Sheet1") firstrow clear
drop if value>100
drop footnote_codes
replace series_id = trim(series_id)
rename series_id area_code
split area_code, p(M)

*Drop seaonally adjusted rates
*drop if area_code1=="LAS" //Drop 656   obs 
*capture drop area_code1
*capture drop area_code2
*Trim the series id to the info we need for the merge 
replace area_code = substr(area_code, 4, 8)

rename period month
replace month = substr(month, 2, .)
replace month = substr(month, 2, .) if substr(month,1,1) == "0"
destring month, replace
rename value Unemployment

duplicates tag area_code year month, generate(dup)
*take the mean of the observations 
collapse Unemployment, by(area_code year month)
drop if year<2000
*Dropping annual
drop if month == 13
*generating quarter
gen quarter = 1 if month == 1 | month == 2 | month == 3
replace quarter = 2 if month == 4 | month == 5 | month == 6
replace quarter = 3 if month == 7 | month == 8 | month == 9
replace quarter = 4 if month == 10 | month == 11 | month == 12
tostring quarter, replace
tostring year, replace 
gen yyyyq = year + quarter
save "$Fake\2000_2015_LauFigures.dta", replace 

use "$Fake\2000_2015_LauFigures.dta", clear
merge m:1 area_code using "$Fake\FakeFileformerge"
keep if _merge == 3 | _merge==2
drop _merge

duplicates tag yyyyq cbsa, generate(dup)
drop dup
collapse (mean) Unemployment, by (yyyyq cbsa year quarter)
destring year, replace
destring quarter, replace
save "$Fake\FakeUnemployment2016.dta", replace

*Merge with NCREIF Fake_data
use "$Fake\NCREIF_wFakeData06062017.dta", replace
capture drop _merge
merge m:1 cbsa yyyyq year quarter using "$Fake\FakeUnemployment2016.dta"
drop if _merge==2
gen unemp_merged=1 if _merge==3
/*
    Result                           # of obs.
    -----------------------------------------
    not matched                         3,070
        from master                       168  (_merge==1)
        from using                      2,902  (_merge==2)

    matched                           107,716  (_merge==3)
    -----------------------------------------
*/

capture drop _merge
save "$Fake\FakeNCREIFUnemployment2016.dta", replace

count if fake_data==1 & unemp_merged==. // Only 4 "fake" observations do not have data 
* I may want to go back and merge this with the other unemployment data for the 168 observations 
*That don't have unemployment data 


*-----------------------------------------*
*	   	Step 2.2 Elec Price 			  *
*-----------------------------------------*	
use "$Datasets\ncreif_elec_final_2000to2015.dta", clear
generate quarter = inlist(month, 1, 2, 3)
replace quarter = 2 if inlist(month, 4, 5, 6)
replace quarter = 3 if inlist(month, 7, 8, 9)
replace quarter = 4 if inlist(month, 10, 11, 12)
rename price elecprice
collapse (mean) elecprice, by(zip year quarter)

sum year

sum quarter

gen Zip = real(zip)
drop zip
rename Zip zip

/*NCREIF + Electricity Price*/
merge m:1 zip using "$Fake\Fakeziplist2016"

/*
. merge m:1 zip using "$Fake\Fakeziplist2016"
(note: variable zip was float, now double to accommodate
       using data's values)

    Result                           # of obs.
    -----------------------------------------
    not matched                        71,616
        from master                    71,616  (_merge==1)
        from using                          0  (_merge==2)

    matched                            79,424  (_merge==3)
    -----------------------------------------

. 
end of do-file
*/
*Check what the results are with the full data set *81344 match in the full set 
keep if _merge == 3
drop _merge

/*NCREIF&Unemployment + Electricity Price*/
merge 1:m year quarter zip using "$Fake\FakeNCREIFUnemployment2016.dta"
/*
    Result                           # of obs.
    -----------------------------------------
    not matched                        37,556
        from master                    37,556  (_merge==1)
        from using                          0  (_merge==2)

    matched                           107,884  (_merge==3)
    -----------------------------------------
*/
keep if _merge==3
capture drop _merge
save "$Fake\FakeNCREIFElectricity2016.dta", replace
*-----------------------------------------*
*	  	 	Step 2.3 HDD & CDD 			  *
*-----------------------------------------*	

**********HEATING AND COOLING DEGREE DAYS*************************
use "$Datasets\US_temperature_byZipcode_2000to2015.dta", clear
keep year quarter temperature_K zcta5ce10
/*
from 2000 1st quarter to 2013 4th quarter*/

compress
save "$Datasets\Temperatures_2000to2015.dta", replace
//////////////////////////////////////////////


use "$Datasets\Temperatures_2000to2015.dta", clear
gen temperature = (9/5)*((temperature_K*.02)-273)+32
gen HDD = (65 - temperature)*8 if temperature < 65 & temperature_K != 0
replace HDD = 0 if HDD == .
gen CDD = (temperature - 65)*8 if temperature > 65 & temperature_K != 0
replace CDD = 0 if CDD == .
gen missingtemp = temperature_K == 0
gen count = 1
rename zcta5ce10 zip
collapse (sum) HDD CDD missingtemp count, by (year quarter zip)
foreach x in HDD CDD {
	replace `x' = `x' * (count/(count-missingtemp))
	}
drop if HDD == . | CDD == .
merge 1:m year zip quarter using "$Fake\FakeNCREIFElectricity2016.dta"
/*
    Result                           # of obs.
    -----------------------------------------
    not matched                     2,076,638
        from master                 2,076,638  (_merge==1)
        from using                          0  (_merge==2)

    matched                           107,884  (_merge==3)
    -----------------------------------------
*/
keep if _merge == 3
drop _merge
save "$Fake\FakeNCREIFWeather2016.dta", replace


*-----------------------------------------*
*Step 2.4 Merge in CPI Adjustment and Gas Price*
*-----------------------------------------*	

*Merge with NCREIF and generate adjusted variables
use "$Fake\FakeNCREIFWeather2016.dta", clear
merge m:1 year zip using "$Datasets\NCREIFGas_2000to2015.dta"
gen gas_merge=1 if _merge==3 //112 observations do not have gas info 
drop if _merge==2
capture drop _merge

*Merge with CPI to then generate adjusted variables
merge m:1 yyyyq using"$Datasets\CPI2016.dta"
/*
    Result                           # of obs.
    -----------------------------------------
    not matched                            21
        from master                         0  (_merge==1)
        from using                         21  (_merge==2)

    matched                           107,884  (_merge==3)
    -----------------------------------------

*/
keep if _merge==3
drop _merge
destring inc_total, replace
destring inc_brent, replace
destring initialcost, replace
destring initacqcost, replace

foreach x in inc_total exp_total exp_util valuepersf valueperunit utilpersf elecprice gasprice inc_brent initialcost initacqcost{
	gen real_`x' = `x' * (170.1 / CPI)
	}
gen logrealutilpersf = log(real_utilpersf)
save "$Fake\FakeNCREIFAdjusted2016.dta", replace
}
*------------------------------------------------------------------------------*
* 						Step 3- Merge with E-Star 							   *
*------------------------------------------------------------------------------*
{
use "$Fake\FakeNCREIFAdjusted2016.dta", clear
capture drop ever_merged
capture drop _merge

*making a temporary file to merge this data with the estar data 
	tempfile PreCleanFakeEStarMerge
	save `PreCleanFakeEStarMerge' 
	
	merge m:1 propnum year using "$Datasets\\NCREIFAdjusted_new_filledwithEstar.dta"
	gen ever_merged=1 if _merge==3
	drop _merge

	merge m:1 propnum year using "$Datasets\\NCREIFAdjusted_01292015_old_filledwithEstar.dta"
	replace ever_merged=1 if _merge==3
	drop if _merge==2
	drop _merge
	rename rating modify_rating1

	merge m:1 propnum year using "$Datasets\\NCREIFAdjusted_01292015_old_filledwithEstar2.dta"
	replace ever_merged=1 if _merge==3
	drop if _merge==2
	drop _merge

*Every single observation in this data set was merged with estar information 

/*
ever_merged |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    107,884      100.00      100.00
------------+-----------------------------------
      Total |    107,884      100.00

*/

capture drop ever_merged
capture drop _merge

*Create the same variables as in step 3-
count if modify_rating!=. & modify_rating1!=. & rating !=.
replace rating = modify_rating if modify_rating !=. & rating==.
replace rating = modify_rating1 if modify_rating1 !=. & rating==.
drop modify_rating*

sort prop year 
capture drop ind_yr last_year_cert years_since_cert ind_cert_last5 ind_cert_last10 ///
	dup_qtr firstobs numqts non0 non0qts complete qt1 qt2 qt3 qt4 qtr1 qtr2 qtr3 ///
	qtr4 miss_qt1 miss_qt2 miss_qt3 miss_qt4 everrated uniquebuilding ///
	buildingevercert buildingnevercert firstyearindata lastyearindata
	
gen ind_yr = year if rating!=.
sum year

*Now, I generate a value for the years since a building was rated 
bysort prop: carryforward ind_yr, gen(last_year_cert) 

gen years_since_cert= year-last_year_cert

*Value of certification has been suggested to decrease over time so, let's gen an indicator for years since cert 
*First, 5 years or less since certification 
gen ind_cert_last5=1 if !missing(years_since_cert) & years_since_cert<=5

*10 years or less since certification 
gen ind_cert_last10=1 if !missing(years_since_cert) & years_since_cert<=10

save "$Fake\FakeNCREIFwithEstar.dta", replace
}
*------------------------------------------------------------------------------*
* 	Step 4- Fill in Missing Obs for Utility Expenditure and Rent			   *
*------------------------------------------------------------------------------*
{
use "$Fake\FakeNCREIFwithEstar.dta", clear 
*First, assess number of observations that do not have the full set of quarterly obs
*and generate a measure of which quarters are missing 

*The code that Paige Gans wrote in Ncreif Energy Star Merge.do no 
*longer works for this data set to determine the missing quarter probelm 
*because now, all of these quarters actually exist in the data, they just 
*do not have any rent or utility bill information  

*-----------------------------------------*
*	Step 4.1 Missing Quarter Problem 	  *
*-----------------------------------------*	
{
*See if there are any quarter-year duplicates 
	duplicates tag propnum year quarter, gen(dup_qtr)
	*no duplicates in the data 
	drop dup_qtr
	*If a value is less than or equal to 0, consider it missing 
	replace real_exp_util=. if real_exp_util<=0
*tag years with less than 4 quarters of data
	gen qt1=real_exp_util if quarter==1
	gen qt2=real_exp_util if quarter==2
	gen qt3=real_exp_util if quarter==3
	gen qt4=real_exp_util if quarter==4
	
	bysort propnum year: egen qtr1=total(qt1), missing
	by propnum year: egen qtr2=total(qt2), missing
	by propnum year: egen qtr3=total(qt3), missing
	by propnum year: egen qtr4=total(qt4), missing

*Say that a quarter is missing information if the real utility expenditure value 
*is 0 or missing or if the expenditure is less than 0, consider it missing
*Generate a value that tells us which quarter is missing for a particular year-property pair 
	gen miss_qt1=1 if missing(qtr1) | qtr1<=0
	gen miss_qt2=1 if missing(qtr2)	| qtr2<=0 
	gen miss_qt3=1 if missing(qtr3)	| qtr3<=0
	gen miss_qt4=1 if missing(qtr4)	| qtr4<=0
*Generate a measure of which observations have information for every quarter 
	gen year_complete=1 if miss_qt1==. & miss_qt2==. & miss_qt3==. & miss_qt4==.
*Generate a measure of which observations have every quarter in at least 1 year 
	by propnum: egen one_yr_complete=min(year_complete)
	
*Generate a measure of the 3 level down- if there is an observation where there 
*is not a complete year, then see if that quarter exists in some other year 
*in the data 
	duplicates tag propnum if one_yr_complete==., generate(propdup)
	tab propdup 
	drop propdup
	*The max of this is 55, there is one property that has 55 observations 
	*and no complete years 
	gen qt1_exist=. 
	gen qt2_exist=. 
	gen qt3_exist=. 
	gen qt4_exist=. 
	set more off
	forvalues i=1/55{
		*Quarter 1 exists in some year for that property
		replace qt1_exist=1 if miss_qt1[_n+`i']==. & propnum==propnum[_n+`i'] ///
				& one_yr_complete==. & miss_qt1==1
		replace qt1_exist=1 if miss_qt1[_n-`i']==. & propnum==propnum[_n-`i'] ///
				& one_yr_complete==. & miss_qt1==1
		*Quarter 2 exists in some year for that property 
		replace qt2_exist=1 if miss_qt2[_n+`i']==. & propnum==propnum[_n+`i'] ///
				& one_yr_complete==. & miss_qt2==1
		replace qt2_exist=1 if miss_qt2[_n-`i']==. & propnum==propnum[_n-`i'] ///
				& one_yr_complete==. & miss_qt2==1
		*Quarter 3 exists in some year for that property 
		replace qt3_exist=1 if miss_qt3[_n+`i']==. & propnum==propnum[_n+`i'] ///
				& one_yr_complete==. & miss_qt3==1
		replace qt3_exist=1 if miss_qt3[_n-`i']==. & propnum==propnum[_n-`i'] ///
				& one_yr_complete==. & miss_qt3==1	
		*Quarter 4 exists in some year for that property 
		replace qt4_exist=1 if miss_qt4[_n+`i']==. & propnum==propnum[_n+`i'] ///
				& one_yr_complete==. & miss_qt4==1
		replace qt4_exist=1 if miss_qt4[_n-`i']==. & propnum==propnum[_n-`i'] ///
				& one_yr_complete==. & miss_qt4==1					
				}
	count if one_yr_complete==. & miss_qt1==1 & qt1_exist==1 //1856
	count if one_yr_complete==. & miss_qt2==1 & qt2_exist==1 //1780
	count if one_yr_complete==. & miss_qt3==1 & qt3_exist==1 //1652
	count if one_yr_complete==. & miss_qt4==1 & qt4_exist==1 //2072

*Now, I want to see how bad the missing data problem is for the obs in the "fake File" 
*Count if nothing is missing 
	count if miss_qt1==. & miss_qt2==. & miss_qt3==. & miss_qt4==.
	*Count if missing only quarter 1 
	count if miss_qt1==1 & miss_qt2==. & miss_qt3==. & miss_qt4==.
	*Count if missing only quarter 2
	count if miss_qt1==. & miss_qt2==1 & miss_qt3==. & miss_qt4==.
	*Count if missing only quarter 3
	count if miss_qt1==. & miss_qt2==. & miss_qt3==1 & miss_qt4==.
	*Count if missing only quarter 4
	count if miss_qt1==. & miss_qt2==. & miss_qt3==. & miss_qt4==1
	*Count if  missing both q1&q2
	count if miss_qt1==1 & miss_qt2==1 & miss_qt3==. & miss_qt4==.
	*Count if  missing both q1&q3
	count if miss_qt1==1 & miss_qt2==. & miss_qt3==1 & miss_qt4==.
	*Count if  missing both q1&q4
	count if miss_qt1==1 & miss_qt2==. & miss_qt3==. & miss_qt4==1
	*count if missing both q2&q3
	count if miss_qt1==. & miss_qt2==1 & miss_qt3==1 & miss_qt4==.
	*count if missing both q2&q4
	count if miss_qt1==. & miss_qt2==1 & miss_qt3==. & miss_qt4==1
	*count if missing both q3&q4
	count if miss_qt1==. & miss_qt2==. & miss_qt3==1 & miss_qt4==1
	*Count if missing q1,q2&q3
	count if miss_qt1==1 & miss_qt2==1 & miss_qt3==1 & miss_qt4==.
	*Count if missing q1,q2&q4
	count if miss_qt1==1 & miss_qt2==1 & miss_qt3==. & miss_qt4==1
	*Count if missing q1,q3&q4
	count if miss_qt1==1 & miss_qt2==. & miss_qt3==1 & miss_qt4==1
	*Count if missing q2,q3,&q4
	count if miss_qt1==. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1
	*Count if missing q1, q2, q3&q4
	count if miss_qt1==1 & miss_qt2==1 & miss_qt3==1 & miss_qt4==1
	
	
*Now, checking out the number of estar buildings we have in the data 
*No of obs with estar rating 
	count if rating!=.
	*No of Obs. that have ever been rated
	bysort propnum: egen everrated=min(rating)
	count if everrated!=.
	*No of obs. that have been rated in the last 5 or 10 years 
	count if ind_cert_last5==1
	count if ind_cert_last10==1
	*No ob obs that have never been estar rated 
	count if everrated==.
	*No of unique propnum in the data set- ie the number of unique buildings in the data set 
	bysort propnum: gen uniquebuilding=1 if _n==1
	count if uniquebuilding==1
	*Find the number of builings that have ever/never been certified 
	gen buildingevercert=1 if uniquebuilding==1 & everrated!=.
	count if buildingevercert==1
	gen buildingnevercert=1 if uniquebuilding==1 & everrated==.
	count if buildingnevercert==1
}
*-----------------------------------------*
*Step 4.2 Gen missing Values of utility Obs*
*-----------------------------------------*	
{
*First, if all of the quarter observations in the year-propnum pair exist, sum them 
	bys propnum year: egen yr_exp_util=total(real_exp_util) if year_complete==1	
	count if yr_exp_util==.
	*34956 missing values 
	count if year_complete==1 & qtr1==0
	count if year_complete==1 & qtr2==0
	count if year_complete==1 & qtr3==0
	count if year_complete==1 & qtr4==0
	
*Fill in the missing data 
*Now, if not all of the quarterly observations exist take the ratio of the 
*real utility expenditures, then take the median observation 
*for all of the observations for each propnum over the yeras 
*********************************STAGE 1****************************************
{
count if real_exp_util==.
gen filled_exp_util=real_exp_util if real_exp_util!=0
count if filled_exp_util==. | filled_exp_util==0
count if real_exp_util==. | real_exp_util==0  

set more off	
forvalues i=1/4 {	
	forvalues j=1/4 {
	gen qt`i'_`j'=qtr`i'/qtr`j' 
	replace qt`i'_`j'=. if qtr`i'<=0 | qtr`j'<=0
	}
	}
	

	//use median ratio to avoid pull of outliers
	forvalues i=1/4 {
		forvalues j=1/4 {
		bysort propnum: egen med`i'_`j'=median(qt`i'_`j') 
		}
		}
		
*Generating an imputed value for the missing quarters
*Step 1- if the value is missing, but there is at least one year for which all 
*of the quarterly obs exist 
 
	*If 1 quarter is missing: 
	*Missing q1 
	replace 	filled_exp_util=(qtr2*med1_2+qtr3*med1_3+qtr4*med1_4)/3 ///
				if qtr2!=. & qtr3!=. & qtr4!=. & miss_qt1==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==1
	*Missing q2
	replace 	filled_exp_util=(qtr1*med2_1+qtr3*med2_3+qtr4*med2_4)/3 ///
				if qtr1!=. & qtr3!=. & qtr4!=. & miss_qt2==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==2
	*Missing q3 
	replace 	filled_exp_util=(qtr1*med3_1+qtr2*med3_2+qtr4*med3_4)/3 ///
				if qtr1!=. & qtr2!=. & qtr4!=. & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==3
				
	*Missing q4 
	replace 	filled_exp_util=(qtr1*med4_1+qtr2*med4_2+qtr3*med4_3)/3 ///
				if qtr1!=. & qtr2!=. & qtr3!=. & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util)
				
	count if filled_exp_util==. 
	
	*If 2 quarters are missing 
	*Missing q1&q2, fill q1
	replace 	filled_exp_util=(qtr3*med1_3+qtr4*med1_4)/2 ///
				if qtr3!=. & qtr4!=. & miss_qt1==1 & miss_qt2==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==1
	*Missing q1&q2, fill q2
	replace 	filled_exp_util=(qtr3*med2_3+qtr4*med2_4)/2 ///
				if qtr3!=. & qtr4!=. & miss_qt1==1 & miss_qt2==1 ///
				& one_yr_complete==1 & missing(filled_exp_util)	& quarter==2
	*missing q1&q3, fill q1
	replace		filled_exp_util=(qtr2*med1_2+qtr4*med1_4)/2 ///
				if qtr2!=. & qtr4!=. & miss_qt1==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==1
	*missing q1&q3, fill q3
	replace		filled_exp_util=(qtr2*med3_2+qtr4*med3_4)/2 ///
				if qtr2!=. & qtr4!=. & miss_qt1==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==3
	*missing q1&q4, fill q1
	replace 	filled_exp_util=(qtr2*med1_2+qtr3*med1_3)/2 ///
				if qtr2!=. & qtr3!=. & miss_qt1==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util)	& quarter==1
	*missing q1&q4, fill q4
	replace 	filled_exp_util=(qtr2*med4_2+qtr3*med4_3)/2 ///
				if qtr2!=. & qtr3!=. & miss_qt1==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util)	& quarter==4
	*missing q2&q3, fill q2
	replace 	filled_exp_util=(qtr1*med2_1+qtr4*med2_4)/2 ///
				if qtr1!=. & qtr4!=. & miss_qt2==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_exp_util)	& quarter==2
	*missing q2&q3, fill q3
	replace 	filled_exp_util=(qtr1*med3_1+qtr4*med3_4)/2 ///
				if qtr1!=. & qtr4!=. & miss_qt2==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_exp_util)	& quarter==3
	*missing q3&q4, fill q3
	replace 	filled_exp_util=(qtr1*med3_1+qtr2*med3_2)/2 ///
				if qtr1!=. & qtr2!=. & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==3	
	*missing q3&q4, fill q4
	replace 	filled_exp_util=(qtr1*med4_1+qtr2*med4_2)/2 ///
				if qtr1!=. & qtr2!=. & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util)	& quarter==4
	*missing q2&q4, fill q2
	replace		filled_exp_util=(qtr1*med2_1+qtr3*med2_3)/2 ///
				if qtr1!=. & qtr3!=. & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==2
	*missing q2&q4, fill q4
	replace		filled_exp_util=(qtr1*med4_1+qtr3*med4_3)/2 ///
				if qtr1!=. & qtr3!=. & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==4
				
	count if filled_exp_util==.
	
	*If 3 quarters are missing 
	*Have q1, fill q2
	replace 	filled_exp_util=qtr1*med2_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==2
	*Have q1, fill q3
	replace 	filled_exp_util=qtr1*med3_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==3
	*Have q1, fill q4
	replace 	filled_exp_util=qtr1*med4_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==4
	*Have q2, fill q1
	replace 	filled_exp_util=qtr2*med1_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==1
	*Have q2, fill q3
	replace 	filled_exp_util=qtr2*med3_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==3
	*Have q2, fill q4
	replace 	filled_exp_util=qtr2*med4_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==4
	*Have q3, fill q1
	replace 	filled_exp_util=qtr3*med1_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==1
	*Have q3, fill q2
	replace 	filled_exp_util=qtr3*med2_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==2
	*Have q3, fill q4
	replace 	filled_exp_util=qtr3*med4_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==4
	*Have q4, fill q1
	replace 	filled_exp_util=qtr4*med1_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==1
	*Have q4, fill q2
	replace 	filled_exp_util=qtr4*med2_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==2
	*Have q4, fill q3
	replace 	filled_exp_util=qtr4*med3_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_exp_util) & quarter==3
				
	count if filled_exp_util==.
	
	*14541 observations missing the value for utility expenditure 
	*This is due to the fact that not every single property has a full year of 
	*observations 
*Same analysis but it's possible that not one full year is complete 

 
	*If 1 quarter is missing: 
	*Missing q1 
	replace 	filled_exp_util=(qtr2*med1_2+qtr3*med1_3+qtr4*med1_4)/3 ///
				if qtr2!=. & qtr3!=. & qtr4!=. & miss_qt1==1 ///
				& missing(filled_exp_util) & quarter==1
	*Missing q2
	replace 	filled_exp_util=(qtr1*med2_1+qtr3*med2_3+qtr4*med2_4)/3 ///
				if qtr1!=. & qtr3!=. & qtr4!=. & miss_qt2==1 ///
				& missing(filled_exp_util) & quarter==2
	*Missing q3 
	replace 	filled_exp_util=(qtr1*med3_1+qtr2*med3_2+qtr4*med3_4)/3 ///
				if qtr1!=. & qtr2!=. & qtr4!=. & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==3
				
	*Missing q4 
	replace 	filled_exp_util=(qtr1*med4_1+qtr2*med4_2+qtr3*med4_3)/3 ///
				if qtr1!=. & qtr2!=. & qtr3!=. & miss_qt4==1 ///
				& missing(filled_exp_util)
				
	count if filled_exp_util==. 
	
	*If 2 quarters are missing 
	*Missing q1&q2, fill q1
	replace 	filled_exp_util=(qtr3*med1_3+qtr4*med1_4)/2 ///
				if qtr3!=. & qtr4!=. & miss_qt1==1 & miss_qt2==1 ///
				& missing(filled_exp_util) & quarter==1
	*Missing q1&q2, fill q2
	replace 	filled_exp_util=(qtr3*med2_3+qtr4*med2_4)/2 ///
				if qtr3!=. & qtr4!=. & miss_qt1==1 & miss_qt2==1 ///
				& missing(filled_exp_util)	& quarter==2
	*missing q1&q3, fill q1
	replace		filled_exp_util=(qtr2*med1_2+qtr4*med1_4)/2 ///
				if qtr2!=. & qtr4!=. & miss_qt1==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==1
	*missing q1&q3, fill q3
	replace		filled_exp_util=(qtr2*med3_2+qtr4*med3_4)/2 ///
				if qtr2!=. & qtr4!=. & miss_qt1==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==3
	*missing q1&q4, fill q1
	replace 	filled_exp_util=(qtr2*med1_2+qtr3*med1_3)/2 ///
				if qtr2!=. & qtr3!=. & miss_qt1==1 & miss_qt4==1 ///
				& missing(filled_exp_util)	& quarter==1
	*missing q1&q4, fill q4
	replace 	filled_exp_util=(qtr2*med4_2+qtr3*med4_3)/2 ///
				if qtr2!=. & qtr3!=. & miss_qt1==1 & miss_qt4==1 ///
				& missing(filled_exp_util)	& quarter==4
	*missing q2&q3, fill q2
	replace 	filled_exp_util=(qtr1*med2_1+qtr4*med2_4)/2 ///
				if qtr1!=. & qtr4!=. & miss_qt2==1 & miss_qt3==1 ///
				& missing(filled_exp_util)	& quarter==2
	*missing q2&q3, fill q3
	replace 	filled_exp_util=(qtr1*med3_1+qtr4*med3_4)/2 ///
				if qtr1!=. & qtr4!=. & miss_qt2==1 & miss_qt3==1 ///
				& missing(filled_exp_util)	& quarter==3
	*missing q3&q4, fill q3
	replace 	filled_exp_util=(qtr1*med3_1+qtr2*med3_2)/2 ///
				if qtr1!=. & qtr2!=. & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==3	
	*missing q3&q4, fill q4
	replace 	filled_exp_util=(qtr1*med4_1+qtr2*med4_2)/2 ///
				if qtr1!=. & qtr2!=. & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util)	& quarter==4
	*missing q2&q4, fill q2
	replace		filled_exp_util=(qtr1*med2_1+qtr3*med2_3)/2 ///
				if qtr1!=. & qtr3!=. & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==2
	*missing q2&q4, fill q4
	replace		filled_exp_util=(qtr1*med4_1+qtr3*med4_3)/2 ///
				if qtr1!=. & qtr3!=. & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==4
				
	count if filled_exp_util==.
	
	*If 3 quarters are missing 
	*Have q1, fill q2
	replace 	filled_exp_util=qtr1*med2_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==2
	*Have q1, fill q3
	replace 	filled_exp_util=qtr1*med3_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==3
	*Have q1, fill q4
	replace 	filled_exp_util=qtr1*med4_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==4
	*Have q2, fill q1
	replace 	filled_exp_util=qtr2*med1_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==1
	*Have q2, fill q3
	replace 	filled_exp_util=qtr2*med3_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==3
	*Have q2, fill q4
	replace 	filled_exp_util=qtr2*med4_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==4
	*Have q3, fill q1
	replace 	filled_exp_util=qtr3*med1_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==1
	*Have q3, fill q2
	replace 	filled_exp_util=qtr3*med2_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==2
	*Have q3, fill q4
	replace 	filled_exp_util=qtr3*med4_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==4
	*Have q4, fill q1
	replace 	filled_exp_util=qtr4*med1_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==1
	*Have q4, fill q2
	replace 	filled_exp_util=qtr4*med2_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==2
	*Have q4, fill q3
	replace 	filled_exp_util=qtr4*med3_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==3
				
	count if filled_exp_util==.
}
********************************STAGE2******************************************
{	
*In the next section, I am taking the median observed value for a building, 
*and then taking the ratio of that to fill in the information for the observation=property 
*pairs that did not have a full year of utility obs in the data. 

drop med*
forvalues i=1/4 {
		bys propnum: egen med`i'=median(qtr`i') if qtr`i'!=0
		}
forvalues i=1/4 {	
	forvalues j=1/4 {
	gen med`i'_`j'=med`i'/med`j' 
	replace med`i'_`j'=. if med`i'<=0 | med`j'<=0
	}
	}
	

*Now take the ratio of the median expenditures for every quarter combinations 
*I am doing this because these are the observations that do not have complete 
*information for a single year 
*Repeat the exact same steps as above-- ***If I try to do the same as above, 
*then we have problems becuase for all of the data, if the year is not complete
*then there are no cases where any of the above analysis holds, so instead, here 
*I switch to taking the median of the values for the property 

	*If 1 quarter is missing: 
	*Missing q1 
	replace 	filled_exp_util=(med2*med1_2+med3*med1_3+med4*med1_4)/3 ///
				if med2!=. & med3!=. & med4!=. & miss_qt1==1 ///
				& missing(filled_exp_util) & quarter==1
	*Missing q2
	replace 	filled_exp_util=(med1*med2_1+med3*med2_3+med4*med2_4)/3 ///
				if med1!=. & med3!=. & med4!=. & miss_qt2==1 ///
				& missing(filled_exp_util) & quarter==2
	*Missing q3 
	replace 	filled_exp_util=(med1*med3_1+med2*med3_2+med4*med3_4)/3 ///
				if med1!=. & med2!=. & med4!=. & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==3
				
	*Missing q4 
	replace 	filled_exp_util=(med1*med4_1+med2*med4_2+med3*med4_3)/3 ///
				if med1!=. & med2!=. & med3!=. & miss_qt4==1 ///
				& missing(filled_exp_util)
				
	count if filled_exp_util==. 
	
	*If 2 quarters are missing 
	*Missing q1&q2, fill q1
	replace 	filled_exp_util=(med3*med1_3+med4*med1_4)/2 ///
				if med3!=. & med4!=. & miss_qt1==1 & miss_qt2==1 ///
				& missing(filled_exp_util) & quarter==1
	*Missing q1&q2, fill q2
	replace 	filled_exp_util=(med3*med2_3+med4*med2_4)/2 ///
				if med3!=. & med4!=. & miss_qt1==1 & miss_qt2==1 ///
				& missing(filled_exp_util)	& quarter==2
	*missing q1&q3, fill q1
	replace		filled_exp_util=(med2*med1_2+med4*med1_4)/2 ///
				if med2!=. & med4!=. & miss_qt1==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==1
	*missing q1&q3, fill q3
	replace		filled_exp_util=(med2*med3_2+med4*med3_4)/2 ///
				if med2!=. & med4!=. & miss_qt1==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==3
	*missing q1&q4, fill q1
	replace 	filled_exp_util=(med2*med1_2+med3*med1_3)/2 ///
				if med2!=. & med3!=. & miss_qt1==1 & miss_qt4==1 ///
				& missing(filled_exp_util)	& quarter==1
	*missing q1&q4, fill q4
	replace 	filled_exp_util=(med2*med4_2+med3*med4_3)/2 ///
				if med2!=. & med3!=. & miss_qt1==1 & miss_qt4==1 ///
				& missing(filled_exp_util)	& quarter==4
	*missing q2&q3, fill q2
	replace 	filled_exp_util=(med1*med2_1+med4*med2_4)/2 ///
				if med1!=. & med4!=. & miss_qt2==1 & miss_qt3==1 ///
				& missing(filled_exp_util)	& quarter==2
	*missing q2&q3, fill q3
	replace 	filled_exp_util=(med1*med3_1+med4*med3_4)/2 ///
				if med1!=. & med4!=. & miss_qt2==1 & miss_qt3==1 ///
				& missing(filled_exp_util)	& quarter==3
	*missing q3&q4, fill q3
	replace 	filled_exp_util=(med1*med3_1+med2*med3_2)/2 ///
				if med1!=. & med2!=. & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==3	
	*missing q3&q4, fill q4
	replace 	filled_exp_util=(med1*med4_1+med2*med4_2)/2 ///
				if med1!=. & med2!=. & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util)	& quarter==4
	*missing q2&q4, fill q2
	replace		filled_exp_util=(med1*med2_1+med3*med2_3)/2 ///
				if med1!=. & med3!=. & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==2
	*missing q2&q4, fill q4
	replace		filled_exp_util=(med1*med4_1+med3*med4_3)/2 ///
				if med1!=. & med3!=. & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==4
				
	count if filled_exp_util==.
	
	*If 3 quarters are missing 
	*Have q1, fill q2
	replace 	filled_exp_util=med1*med2_1 ///
				if med1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==2
	*Have q1, fill q3
	replace 	filled_exp_util=med1*med3_1 ///
				if med1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==3
	*Have q1, fill q4
	replace 	filled_exp_util=med1*med4_1 ///
				if med1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==4
	*Have q2, fill q1
	replace 	filled_exp_util=med2*med1_2 ///
				if med2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==1
	*Have q2, fill q3
	replace 	filled_exp_util=med2*med3_2 ///
				if med2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==3
	*Have q2, fill q4
	replace 	filled_exp_util=med2*med4_2 ///
				if med2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==4
	*Have q3, fill q1
	replace 	filled_exp_util=med3*med1_3 ///
				if med3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==1
	*Have q3, fill q2
	replace 	filled_exp_util=med3*med2_3 ///
				if med3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==2
	*Have q3, fill q4
	replace 	filled_exp_util=med3*med4_3 ///
				if med3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_exp_util) & quarter==4
	*Have q4, fill q1
	replace 	filled_exp_util=med4*med1_4 ///
				if med4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==1
	*Have q4, fill q2
	replace 	filled_exp_util=med4*med2_4 ///
				if med4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==2
	*Have q4, fill q3
	replace 	filled_exp_util=med4*med3_4 ///
				if med4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_exp_util) & quarter==3
				
	count if filled_exp_util==.
}
	*Now that I have generated filled in values, generate the 
	*Yearly utility expenditure measures 
	bysort propnum year: egen not_missing_obs=max(filled_exp_util)
	drop yr_exp_util
	bys propnum year: egen yr_exp_util=total(filled_exp_util) if !missing(not_missing_obs)	
	
*make the var of interest- yearly real utility expenditure per square foot 
	bysort propnum year: egen yr_exp_util_max=max(yr_exp_util)
	bysort propnum year: egen yr_exp_util_mode=mode(yr_exp_util)
	br if yr_exp_util_mode!=yr_exp_util_max
	by propnum year: egen sqft_max=max(sqft)
	by propnum year: egen sqft_mode=mode(sqft), max
	br if sqft_mode!=sqft_max
	by propnum year: egen sqft_min=min(sqft)
	br if sqft_mode!=sqft_min
	br if sqft_min!=sqft_max
	*If the observations don't have the same square footage in a year, take the mode
	*Check with Margaret and Karen about this 
	replace sqft=sqft_mode 
	*This replaces the information for 0 observations 
	gen rl_yr_utilpersf=yr_exp_util/sqft
	*Generates 7488 missing obs 
save "$Fake\FakeNCREIFwfilledUtilpersf", replace
}
*-----------------------------------------*
*			Step 4.3 Summary Stats		  *
*-----------------------------------------*	
{	
*SUMMARY STATS 

*Summary Statistics: 
	*Check to make sure real util per sf is the same for all obs in a year 
	bysort propnum year: egen utilpersf_mode=mode(rl_yr_utilpersf), max
	bysort propnum year: egen utilpersf_max=max(rl_yr_utilpersf)
	count if utilpersf_max!=utilpersf_mode //0 Observations 
	*There are 380 observations that have a 0 for square footage- we
	*must count these as missing, so I am going to generate a new 
	*measure of having every obs of the var of interest real utility exp per 
	*sqft being complete for a year 
preserve
count if rl_yr_utilpersf!=.	
gen ind=1 
by propnum year: egen numqtrsfilled=total(ind) if rl_yr_utilpersf!=.
keep if numqtrsfilled==4
count if rl_yr_utilpersf!=.


collapse (min) rating last_year_cert years_since_cert ind_cert_last5 ///
	ind_cert_last10 rl_yr_utilpersf, by(propnum year)

	*Energy Star Certification Summary Stats: 

	*Now, checking out the number of estar buildings we have in the data 
	*No of obs with estar rating that year
	count if rating!=.
	*No of Obs. that have ever been rated
	bysort propnum: egen everrated=min(rating)
	count if everrated!=.
	*No of obs. that have been rated in the last 5 or 10 years 
	count if ind_cert_last5==1
	count if ind_cert_last10==1
	*No ob obs that have never been estar rated 
	count if everrated==.
	*No of unique propnum in the data set- ie the number of unique buildings in the data set 
	bysort propnum: gen uniquebuilding=1 if _n==1
	count if uniquebuilding==1
	*Find the number of builings that have ever/never been certified 
	gen buildingevercert=1 if uniquebuilding==1 & everrated!=.
	count if buildingevercert==1
	gen buildingnevercert=1 if uniquebuilding==1 & everrated==.
	count if buildingnevercert==1
drop if rl_yr_utilpersf>12.50
*Real Utility per square foot Information 
	su rl_yr_utilpersf, detail
	su rl_yr_utilpersf if rating!=., detail
	su rl_yr_utilpersf if rating==., detail
	su rl_yr_utilpersf if ind_cert_last5==1, detail
	su rl_yr_utilpersf if ind_cert_last5==., detail
	su rl_yr_utilpersf if ind_cert_last10==1, detail
	su rl_yr_utilpersf if ind_cert_last10==., detail
	*Now I want builiding average information- ie average util per sf
	*if the building has ever been certified versus never been certified 
	*if a building has ever been rated
	su rl_yr_utilpersf if everrated!=.
	su rl_yr_utilpersf if everrated==.
	*Generate a unique building id
	by propnum: egen building_avg=mean(rl_yr_utilpersf)
	drop if uniquebuilding!=1
	su rl_yr_utilpersf if everrated!=.
	su rl_yr_utilpersf if everrated==.
	
	by propnum: egen firstyearrated=min(last_year_cert)
	*trying to find the mean of the property before and after 
	*certification 
	count if firstyearrated!=.
	su rl_yr_utilpersf if !missing(firstyearrated) & year<firstyearrated, detail
	su rl_yr_utilpersf if !missing(firstyearrated) & year>=firstyearrated, detail

	
*Graphs-- 
	*kdensities of before and after cert on the same graph 
	twoway 	(kdensity rl_yr_utilpersf if !missing(firstyearrated) & year<firstyearrated, biweight width(5) lwidth(medthick) lcolor(navy)) ///
			(kdensity rl_yr_utilpersf if !missing(firstyearrated) & year>=firstyearrated, biweight width(5) lwidth(medthick) lcolor(cyan) ///
			bgcolor(white) graphregion(color(white)) xtitle(Real Utility Expenditure Per Sqft.) ytitle(Kernel Density) ///
			legend(lab(1 "Before Certification") lab(2 "After Certification")) title("2nd Stage Fill"))
	graph export "P:\Efficiency Financing\NCREIF\Summer_2017\Notes&OtherInfo\Summary Graphs\SecondStageFill.png", replace
restore 
	
capture drop qt1 qt2 qt3 qt4 qtr1 qtr2 qtr3 qtr4 miss_qt1 miss_qt2 miss_qt3 miss_qt4 ///
			qt1_1 qt1_2 qt1_3 qt1_4 qt2_1 qt2_2 qt2_3 qt2_4 qt3_1 qt3_2 qt3_3 ///
			qt3_4 qt4_1 qt4_2 qt4_3 qt4_4 med* not_missing_obs sqft_max sqft_mode ///
			year_complete one_yr_complete sqft_min	
save "$Fake\FakeNCREIFwfilledUtilpersf", replace
}
*-----------------------------------------*
*	Step 4.5 Gen missing Values of Rent   *
*-----------------------------------------*	

use "$Fake\FakeNCREIFwfilledUtilpersf", replace
	capture drop qt1 qt2 qt3 qt4 qtr1 qtr2 qtr3 qtr4 miss_qt1 miss_qt2 miss_qt3 miss_qt4 ///
			qt1_1 qt1_2 qt1_3 qt1_4 qt2_1 qt2_2 qt2_3 qt2_4 qt3_1 qt3_2 qt3_3 ///
			qt3_4 qt4_1 qt4_2 qt4_3 qt4_4 med* not_missing_obs sqft_max sqft_mode ///
			year_complete one_yr_complete sqft_min	
	destring inc_brent, replace 
	capture drop real_inc_brent
	gen real_inc_brent = inc_brent * (170.1 / CPI)
	
*Classify the missing observations (considering 0s, and negatives as missing) 
	replace real_inc_brent=. if real_inc_brent<=0
	count if real_inc_brent!=. //89527
*tag years with less than 4 quarters of data
	gen qt1=real_inc_brent if quarter==1
	gen qt2=real_inc_brent if quarter==2
	gen qt3=real_inc_brent if quarter==3
	gen qt4=real_inc_brent if quarter==4
	
	bysort propnum year: egen qtr1=total(qt1), missing
	by propnum year: egen qtr2=total(qt2), missing
	by propnum year: egen qtr3=total(qt3), missing
	by propnum year: egen qtr4=total(qt4), missing
*Mark each quarter that is missing rental information 
	gen miss_qt1=1 if missing(qtr1) | qtr1<=0
	gen miss_qt2=1 if missing(qtr2)	| qtr2<=0 
	gen miss_qt3=1 if missing(qtr3)	| qtr3<=0
	gen miss_qt4=1 if missing(qtr4)	| qtr4<=0
*Generate a measure of which observations have information for every quarter 
	capture drop year_complete one_yr_complete
	gen year_complete=1 if miss_qt1==. & miss_qt2==. & miss_qt3==. & miss_qt4==.
*Generate a measure of which observations have every quarter in at least 1 year 
	by propnum: egen one_yr_complete=min(year_complete)
	
*Begin to fill in the data: 
	count if real_inc_brent==. //18357
	gen filled_inc_brent=real_inc_brent if real_inc_brent!=0
	count if filled_inc_brent==.

********************************STAGE1******************************************
{
*Generate the ratios that will be used to project missing quarter information 
set more off	
forvalues i=1/4 {	
	forvalues j=1/4 {
	gen qt`i'_`j'=qtr`i'/qtr`j' 
	replace qt`i'_`j'=. if qtr`i'<=0 | qtr`j'<=0
	}
	}
	

	//use median ratio to avoid pull of outliers
	forvalues i=1/4 {
		forvalues j=1/4 {
		bysort propnum: egen med`i'_`j'=median(qt`i'_`j') 
		}
		}	
	
*If 1 quarter is missing: 
	*Missing q1 
	replace 	filled_inc_brent=(qtr2*med1_2+qtr3*med1_3+qtr4*med1_4)/3 ///
				if qtr2!=. & qtr3!=. & qtr4!=. & miss_qt1==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==1
	*Missing q2
	replace 	filled_inc_brent=(qtr1*med2_1+qtr3*med2_3+qtr4*med2_4)/3 ///
				if qtr1!=. & qtr3!=. & qtr4!=. & miss_qt2==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==2
	*Missing q3 
	replace 	filled_inc_brent=(qtr1*med3_1+qtr2*med3_2+qtr4*med3_4)/3 ///
				if qtr1!=. & qtr2!=. & qtr4!=. & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==3
				
	*Missing q4 
	replace 	filled_inc_brent=(qtr1*med4_1+qtr2*med4_2+qtr3*med4_3)/3 ///
				if qtr1!=. & qtr2!=. & qtr3!=. & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent)
				
	count if filled_inc_brent==. 
	
	*If 2 quarters are missing 
	*Missing q1&q2, fill q1
	replace 	filled_inc_brent=(qtr3*med1_3+qtr4*med1_4)/2 ///
				if qtr3!=. & qtr4!=. & miss_qt1==1 & miss_qt2==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==1
	*Missing q1&q2, fill q2
	replace 	filled_inc_brent=(qtr3*med2_3+qtr4*med2_4)/2 ///
				if qtr3!=. & qtr4!=. & miss_qt1==1 & miss_qt2==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent)	& quarter==2
	*missing q1&q3, fill q1
	replace		filled_inc_brent=(qtr2*med1_2+qtr4*med1_4)/2 ///
				if qtr2!=. & qtr4!=. & miss_qt1==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==1
	*missing q1&q3, fill q3
	replace		filled_inc_brent=(qtr2*med3_2+qtr4*med3_4)/2 ///
				if qtr2!=. & qtr4!=. & miss_qt1==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==3
	*missing q1&q4, fill q1
	replace 	filled_inc_brent=(qtr2*med1_2+qtr3*med1_3)/2 ///
				if qtr2!=. & qtr3!=. & miss_qt1==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent)	& quarter==1
	*missing q1&q4, fill q4
	replace 	filled_inc_brent=(qtr2*med4_2+qtr3*med4_3)/2 ///
				if qtr2!=. & qtr3!=. & miss_qt1==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent)	& quarter==4
	*missing q2&q3, fill q2
	replace 	filled_inc_brent=(qtr1*med2_1+qtr4*med2_4)/2 ///
				if qtr1!=. & qtr4!=. & miss_qt2==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent)	& quarter==2
	*missing q2&q3, fill q3
	replace 	filled_inc_brent=(qtr1*med3_1+qtr4*med3_4)/2 ///
				if qtr1!=. & qtr4!=. & miss_qt2==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent)	& quarter==3
	*missing q3&q4, fill q3
	replace 	filled_inc_brent=(qtr1*med3_1+qtr2*med3_2)/2 ///
				if qtr1!=. & qtr2!=. & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==3	
	*missing q3&q4, fill q4
	replace 	filled_inc_brent=(qtr1*med4_1+qtr2*med4_2)/2 ///
				if qtr1!=. & qtr2!=. & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent)	& quarter==4
	*missing q2&q4, fill q2
	replace		filled_inc_brent=(qtr1*med2_1+qtr3*med2_3)/2 ///
				if qtr1!=. & qtr3!=. & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==2
	*missing q2&q4, fill q4
	replace		filled_inc_brent=(qtr1*med4_1+qtr3*med4_3)/2 ///
				if qtr1!=. & qtr3!=. & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==4
				
	count if filled_inc_brent==.
	
	*If 3 quarters are missing 
	*Have q1, fill q2
	replace 	filled_inc_brent=qtr1*med2_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==2
	*Have q1, fill q3
	replace 	filled_inc_brent=qtr1*med3_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==3
	*Have q1, fill q4
	replace 	filled_inc_brent=qtr1*med4_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==4
	*Have q2, fill q1
	replace 	filled_inc_brent=qtr2*med1_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==1
	*Have q2, fill q3
	replace 	filled_inc_brent=qtr2*med3_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==3
	*Have q2, fill q4
	replace 	filled_inc_brent=qtr2*med4_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==4
	*Have q3, fill q1
	replace 	filled_inc_brent=qtr3*med1_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==1
	*Have q3, fill q2
	replace 	filled_inc_brent=qtr3*med2_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==2
	*Have q3, fill q4
	replace 	filled_inc_brent=qtr3*med4_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==4
	*Have q4, fill q1
	replace 	filled_inc_brent=qtr4*med1_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==1
	*Have q4, fill q2
	replace 	filled_inc_brent=qtr4*med2_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==2
	*Have q4, fill q3
	replace 	filled_inc_brent=qtr4*med3_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& one_yr_complete==1 & missing(filled_inc_brent) & quarter==3
				
	count if filled_inc_brent==.
	
	*9927 observations missing the value for utility expenditure 
	
	*Same analysis but it's possible that not one full year is complete 

 
	*If 1 quarter is missing: 
	*Missing q1 
	replace 	filled_inc_brent=(qtr2*med1_2+qtr3*med1_3+qtr4*med1_4)/3 ///
				if qtr2!=. & qtr3!=. & qtr4!=. & miss_qt1==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Missing q2
	replace 	filled_inc_brent=(qtr1*med2_1+qtr3*med2_3+qtr4*med2_4)/3 ///
				if qtr1!=. & qtr3!=. & qtr4!=. & miss_qt2==1 ///
				& missing(filled_inc_brent) & quarter==2
	*Missing q3 
	replace 	filled_inc_brent=(qtr1*med3_1+qtr2*med3_2+qtr4*med3_4)/3 ///
				if qtr1!=. & qtr2!=. & qtr4!=. & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==3
				
	*Missing q4 
	replace 	filled_inc_brent=(qtr1*med4_1+qtr2*med4_2+qtr3*med4_3)/3 ///
				if qtr1!=. & qtr2!=. & qtr3!=. & miss_qt4==1 ///
				& missing(filled_inc_brent)
				
	count if filled_inc_brent==. 
	
	*If 2 quarters are missing 
	*Missing q1&q2, fill q1
	replace 	filled_inc_brent=(qtr3*med1_3+qtr4*med1_4)/2 ///
				if qtr3!=. & qtr4!=. & miss_qt1==1 & miss_qt2==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Missing q1&q2, fill q2
	replace 	filled_inc_brent=(qtr3*med2_3+qtr4*med2_4)/2 ///
				if qtr3!=. & qtr4!=. & miss_qt1==1 & miss_qt2==1 ///
				& missing(filled_inc_brent)	& quarter==2
	*missing q1&q3, fill q1
	replace		filled_inc_brent=(qtr2*med1_2+qtr4*med1_4)/2 ///
				if qtr2!=. & qtr4!=. & miss_qt1==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==1
	*missing q1&q3, fill q3
	replace		filled_inc_brent=(qtr2*med3_2+qtr4*med3_4)/2 ///
				if qtr2!=. & qtr4!=. & miss_qt1==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==3
	*missing q1&q4, fill q1
	replace 	filled_inc_brent=(qtr2*med1_2+qtr3*med1_3)/2 ///
				if qtr2!=. & qtr3!=. & miss_qt1==1 & miss_qt4==1 ///
				& missing(filled_inc_brent)	& quarter==1
	*missing q1&q4, fill q4
	replace 	filled_inc_brent=(qtr2*med4_2+qtr3*med4_3)/2 ///
				if qtr2!=. & qtr3!=. & miss_qt1==1 & miss_qt4==1 ///
				& missing(filled_inc_brent)	& quarter==4
	*missing q2&q3, fill q2
	replace 	filled_inc_brent=(qtr1*med2_1+qtr4*med2_4)/2 ///
				if qtr1!=. & qtr4!=. & miss_qt2==1 & miss_qt3==1 ///
				& missing(filled_inc_brent)	& quarter==2
	*missing q2&q3, fill q3
	replace 	filled_inc_brent=(qtr1*med3_1+qtr4*med3_4)/2 ///
				if qtr1!=. & qtr4!=. & miss_qt2==1 & miss_qt3==1 ///
				& missing(filled_inc_brent)	& quarter==3
	*missing q3&q4, fill q3
	replace 	filled_inc_brent=(qtr1*med3_1+qtr2*med3_2)/2 ///
				if qtr1!=. & qtr2!=. & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==3	
	*missing q3&q4, fill q4
	replace 	filled_inc_brent=(qtr1*med4_1+qtr2*med4_2)/2 ///
				if qtr1!=. & qtr2!=. & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent)	& quarter==4
	*missing q2&q4, fill q2
	replace		filled_inc_brent=(qtr1*med2_1+qtr3*med2_3)/2 ///
				if qtr1!=. & qtr3!=. & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==2
	*missing q2&q4, fill q4
	replace		filled_inc_brent=(qtr1*med4_1+qtr3*med4_3)/2 ///
				if qtr1!=. & qtr3!=. & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==4
				
	count if filled_inc_brent==.
	
	*If 3 quarters are missing 
	*Have q1, fill q2
	replace 	filled_inc_brent=qtr1*med2_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==2
	*Have q1, fill q3
	replace 	filled_inc_brent=qtr1*med3_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==3
	*Have q1, fill q4
	replace 	filled_inc_brent=qtr1*med4_1 ///
				if qtr1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==4
	*Have q2, fill q1
	replace 	filled_inc_brent=qtr2*med1_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Have q2, fill q3
	replace 	filled_inc_brent=qtr2*med3_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==3
	*Have q2, fill q4
	replace 	filled_inc_brent=qtr2*med4_2 ///
				if qtr2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==4
	*Have q3, fill q1
	replace 	filled_inc_brent=qtr3*med1_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Have q3, fill q2
	replace 	filled_inc_brent=qtr3*med2_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==2
	*Have q3, fill q4
	replace 	filled_inc_brent=qtr3*med4_3 ///
				if qtr3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==4
	*Have q4, fill q1
	replace 	filled_inc_brent=qtr4*med1_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Have q4, fill q2
	replace 	filled_inc_brent=qtr4*med2_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==2
	*Have q4, fill q3
	replace 	filled_inc_brent=qtr4*med3_4 ///
				if qtr4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==3
				
	count if filled_inc_brent==.
	*9887 values still missing 
}	
********************************STAGE2******************************************
{	
*In the next section, I am taking the median observed value for a building, 
*and then taking the ratio of that to fill in the information for the observation=property 
*pairs that did not have a full year of utility obs in the data. 

drop med*
forvalues i=1/4 {
		bys propnum: egen med`i'=median(qtr`i') if qtr`i'!=0
		}
forvalues i=1/4 {	
	forvalues j=1/4 {
	gen med`i'_`j'=med`i'/med`j' 
	replace med`i'_`j'=. if med`i'<=0 | med`j'<=0
	}
	}
	

*Now take the ratio of the median expenditures for every quarter combinations 
*I am doing this because these are the observations that do not have complete 
*information for a single year 
*Repeat the exact same steps as above-- ***If I try to do the same as above, 
*then we have problems becuase for all of the data, if the year is not complete
*then there are no cases where any of the above analysis holds, so instead, here 
*I switch to taking the median of the values for the property 

	*If 1 quarter is missing: 
	*Missing q1 
	replace 	filled_inc_brent=(med2*med1_2+med3*med1_3+med4*med1_4)/3 ///
				if med2!=. & med3!=. & med4!=. & miss_qt1==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Missing q2
	replace 	filled_inc_brent=(med1*med2_1+med3*med2_3+med4*med2_4)/3 ///
				if med1!=. & med3!=. & med4!=. & miss_qt2==1 ///
				& missing(filled_inc_brent) & quarter==2
	*Missing q3 
	replace 	filled_inc_brent=(med1*med3_1+med2*med3_2+med4*med3_4)/3 ///
				if med1!=. & med2!=. & med4!=. & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==3
				
	*Missing q4 
	replace 	filled_inc_brent=(med1*med4_1+med2*med4_2+med3*med4_3)/3 ///
				if med1!=. & med2!=. & med3!=. & miss_qt4==1 ///
				& missing(filled_inc_brent)
				
	count if filled_inc_brent==. 
	
	*If 2 quarters are missing 
	*Missing q1&q2, fill q1
	replace 	filled_inc_brent=(med3*med1_3+med4*med1_4)/2 ///
				if med3!=. & med4!=. & miss_qt1==1 & miss_qt2==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Missing q1&q2, fill q2
	replace 	filled_inc_brent=(med3*med2_3+med4*med2_4)/2 ///
				if med3!=. & med4!=. & miss_qt1==1 & miss_qt2==1 ///
				& missing(filled_inc_brent)	& quarter==2
	*missing q1&q3, fill q1
	replace		filled_inc_brent=(med2*med1_2+med4*med1_4)/2 ///
				if med2!=. & med4!=. & miss_qt1==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==1
	*missing q1&q3, fill q3
	replace		filled_inc_brent=(med2*med3_2+med4*med3_4)/2 ///
				if med2!=. & med4!=. & miss_qt1==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==3
	*missing q1&q4, fill q1
	replace 	filled_inc_brent=(med2*med1_2+med3*med1_3)/2 ///
				if med2!=. & med3!=. & miss_qt1==1 & miss_qt4==1 ///
				& missing(filled_inc_brent)	& quarter==1
	*missing q1&q4, fill q4
	replace 	filled_inc_brent=(med2*med4_2+med3*med4_3)/2 ///
				if med2!=. & med3!=. & miss_qt1==1 & miss_qt4==1 ///
				& missing(filled_inc_brent)	& quarter==4
	*missing q2&q3, fill q2
	replace 	filled_inc_brent=(med1*med2_1+med4*med2_4)/2 ///
				if med1!=. & med4!=. & miss_qt2==1 & miss_qt3==1 ///
				& missing(filled_inc_brent)	& quarter==2
	*missing q2&q3, fill q3
	replace 	filled_inc_brent=(med1*med3_1+med4*med3_4)/2 ///
				if med1!=. & med4!=. & miss_qt2==1 & miss_qt3==1 ///
				& missing(filled_inc_brent)	& quarter==3
	*missing q3&q4, fill q3
	replace 	filled_inc_brent=(med1*med3_1+med2*med3_2)/2 ///
				if med1!=. & med2!=. & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==3	
	*missing q3&q4, fill q4
	replace 	filled_inc_brent=(med1*med4_1+med2*med4_2)/2 ///
				if med1!=. & med2!=. & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent)	& quarter==4
	*missing q2&q4, fill q2
	replace		filled_inc_brent=(med1*med2_1+med3*med2_3)/2 ///
				if med1!=. & med3!=. & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==2
	*missing q2&q4, fill q4
	replace		filled_inc_brent=(med1*med4_1+med3*med4_3)/2 ///
				if med1!=. & med3!=. & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==4
				
	count if filled_inc_brent==.
	
	*If 3 quarters are missing 
	*Have q1, fill q2
	replace 	filled_inc_brent=med1*med2_1 ///
				if med1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==2
	*Have q1, fill q3
	replace 	filled_inc_brent=med1*med3_1 ///
				if med1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==3
	*Have q1, fill q4
	replace 	filled_inc_brent=med1*med4_1 ///
				if med1!=. & miss_qt2==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==4
	*Have q2, fill q1
	replace 	filled_inc_brent=med2*med1_2 ///
				if med2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Have q2, fill q3
	replace 	filled_inc_brent=med2*med3_2 ///
				if med2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==3
	*Have q2, fill q4
	replace 	filled_inc_brent=med2*med4_2 ///
				if med2!=. & miss_qt1==1 & miss_qt3==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==4
	*Have q3, fill q1
	replace 	filled_inc_brent=med3*med1_3 ///
				if med3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Have q3, fill q2
	replace 	filled_inc_brent=med3*med2_3 ///
				if med3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==2
	*Have q3, fill q4
	replace 	filled_inc_brent=med3*med4_3 ///
				if med3!=. & miss_qt1==1 & miss_qt2==1 & miss_qt4==1 ///
				& missing(filled_inc_brent) & quarter==4
	*Have q4, fill q1
	replace 	filled_inc_brent=med4*med1_4 ///
				if med4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==1
	*Have q4, fill q2
	replace 	filled_inc_brent=med4*med2_4 ///
				if med4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==2
	*Have q4, fill q3
	replace 	filled_inc_brent=med4*med3_4 ///
				if med4!=. & miss_qt1==1 & miss_qt3==1 & miss_qt3==1 ///
				& missing(filled_inc_brent) & quarter==3
				
	count if filled_inc_brent==.
	*4127 observations missing 
}

*-----------------------------------------*
*	Step 4.6 Summary Stats for rent 	  *
*-----------------------------------------*	

*SUMMARY STATS 

*Summary Statistics: 
	bysort propnum year quarter: egen not_missing_obs=max(filled_inc_brent)
	drop yr_exp_util
	bys propnum year: egen yr_inc_brent=total(filled_inc_brent) if !missing(not_missing_obs)	
*make the var of interest- yearly real rent per square foot 
	bysort propnum year: egen yr_inc_brent_max=max(yr_inc_brent)
	bysort propnum year: egen yr_inc_brent_mode=mode(yr_inc_brent)
	br if yr_inc_brent_mode!=yr_inc_brent_max
	by propnum year: egen sqft_max=max(sqft)
	by propnum year: egen sqft_mode=mode(sqft), max
	br if sqft_mode!=sqft_max
	by propnum year: egen sqft_min=min(sqft)
	br if sqft_mode!=sqft_min
	br if sqft_min!=sqft_max
	*If the observations don't have the same square footage in a year, take the mode
	*Check with Margaret and Karen about this 
	replace sqft=sqft_mode 
	*This replaces the information for 0 observations 
	gen rl_yr_rentpersf=yr_inc_brent/sqft
	*Check to make sure real util per sf is the same for all obs in a year 
	bysort propnum year: egen rentpersf_mode=mode(rl_yr_rentpersf), max
	bysort propnum year: egen rentpersf_max=max(rl_yr_rentpersf)
	count if rentpersf_max!=rentpersf_mode //0 Observations 
	
preserve
count if rl_yr_rentpersf!=.	
gen ind=1 
by propnum year: egen numqtrsfilled=total(ind) if rl_yr_rentpersf!=.
keep if numqtrsfilled==4
count if rl_yr_rentpersf!=.


collapse (min) rating last_year_cert years_since_cert ind_cert_last5 ///
	ind_cert_last10 rl_yr_rentpersf, by(propnum year)

	*Energy Star Certification Summary Stats: 

	*Now, checking out the number of estar buildings we have in the data 
	*No of obs with estar rating that year
	count if rating!=.
	*No of Obs. that have ever been rated
	bysort propnum: egen everrated=min(rating)
	count if everrated!=.
	*No of obs. that have been rated in the last 5 or 10 years 
	count if ind_cert_last5==1
	count if ind_cert_last10==1
	*No ob obs that have never been estar rated 
	count if everrated==.
	*No of unique propnum in the data set- ie the number of unique buildings in the data set 
	bysort propnum: gen uniquebuilding=1 if _n==1
	count if uniquebuilding==1
	*Find the number of builings that have ever/never been certified 
	gen buildingevercert=1 if uniquebuilding==1 & everrated!=.
	count if buildingevercert==1
	gen buildingnevercert=1 if uniquebuilding==1 & everrated==.
	count if buildingnevercert==1

*Real Utility per square foot Information 
	su rl_yr_rentpersf, detail
	su rl_yr_rentpersf if rating!=., detail
	su rl_yr_rentpersf if rating==., detail
	su rl_yr_rentpersf if ind_cert_last5==1, detail
	su rl_yr_rentpersf if ind_cert_last5==., detail
	su rl_yr_rentpersf if ind_cert_last10==1, detail
	su rl_yr_rentpersf if ind_cert_last10==., detail
	
	by propnum: egen firstyearrated=min(last_year_cert)
	*trying to find the mean of the property before and after 
	*certification 
	count if firstyearrated!=.
	su rl_yr_rentpersf if !missing(firstyearrated) & year<firstyearrated, detail
	su rl_yr_rentpersf if !missing(firstyearrated) & year>=firstyearrated, detail

*Graphs-- 
	*kdensities of before and after cert on the same graph 
	twoway 	(kdensity rl_yr_rentpersf if !missing(firstyearrated) & year<firstyearrated, biweight width(5) lwidth(medthick) lcolor(navy)) ///
			(kdensity rl_yr_rentpersf if !missing(firstyearrated) & year>=firstyearrated, biweight width(5) lwidth(medthick) lcolor(cyan) ///
			bgcolor(white) graphregion(color(white)) xtitle(Real Utility Expenditure Per Sqft.) ytitle(Kernel Density) ///
			legend(lab(1 "Before Certification") lab(2 "After Certification")) title("Second Stage Fill"))
	graph export "P:\Efficiency Financing\NCREIF\Summer_2017\Notes&OtherInfo\Summary Graphs\RentSecondStageFill.png", replace	

restore
drop rentpersf_max rentpersf_mode sqft_min sqft_mode sqft_max yr_inc_brent_mode ///
yr_inc_brent_max not_missing_obs med* qt4_4 qt4_3 qt4_2 qt4_1 qt3_4 qt3_3 qt3_2 ///
qt3_1 qt2_4 qt2_3 qt2_2 qt2_1 qt1_4 qt1_3 qt1_2 qt1_1 miss_qt4 miss_qt3 miss_qt2 ///
miss_qt1 one_yr_complete year_complete qtr4 qtr3 qtr2 qtr1 qt4 qt3 qt2 qt1 ///
yr_exp_util_mode yr_exp_util_max qt1_exist qt2_exist qt3_exist qt4_exist uniquebuilding

save "$Fake\FakeNCREIFfiletoannualize", replace
	}
*------------------------------------------------------------------------------*
* 				Step 5- Make the Data Annual and Clean 						   *
*------------------------------------------------------------------------------*
*-----------------------------------------*
*		Step 5.1 Make Data Annual	 	  *
*-----------------------------------------*	
set more off 
use "$Fake\FakeNCREIFfiletoannualize", clear

*Drop Variables that will not be used in our analysis 
	drop missingtemp count appret incret totret mv noi capex psales denom ///
	capex_addacqcost capex_bldexpan capex_bldimp capex_leasecomm capex_other ///
	capex_ti capex_tot_calc 
*See what data is missing in the variables that we will keep in the data set 
	count if HDD==. //0
	count if CDD==. //0
	count if elecprice==. // 596 (All in Texas) 
	count if missing(propertytype)
	count if missing(acqdate)
	count if missing(Unemployment)
	
*Fixing Missing Data 
	*168 Observations for Unemployment are missing, use the file from NCREIF_estar by bb to 
	*merge in this missing information 
	
	use "$GenData\NCREIF_wmissingquartersinfo.dta", clear
	keep if cbsa== 18180 | cbsa== 26740 | cbsa==27940 | cbsa== 30220
	keep cbsa year quarter yyyyq Unemployment
	collapse (mean) Unemployment, by(cbsa yyyyq)
	rename Unemployment Unemp_merge
	
	*Leaves me with 168 observations 
	tempfile extraUnemp 
	save "`extraUnemp'"
	
	use "$Fake\FakeNCREIFfiletoannualize", replace
	merge m:1 cbsa yyyyq using "`extraUnemp'"
	drop _merge 
	replace Unemployment=Unemp_merge if Unemployment==.
	drop Unemp_merge
	save "$Fake\FakeNCREIFfiletoannualize", replace
	count if missing(Unemployment) //Now only 4 values Missing 
	


*-----------------------------------------*
*		Step 5.2 Clean Data		 	 	  *
*-----------------------------------------*	

	count if HDD==. //0
	count if CDD==. //0
	count if real_elecprice==. // 596(All in Texas) 
	count if missing(propertytype)
	count if missing(acqdate) // 3412 missing
	count if missing(Unemployment) // 4 missing
	count if real_gasprice==. //112 missing real gas price 
	
	/*1.yrbuilt*/
	bysort propnum: egen yrbuilt_max = max(yrbuilt)
	bysort propnum: egen yrbuilt_mode = mode(yrbuilt),maxmode
	count if yrbuilt_mode ==0 & yrbuilt_max>0 /*0 buildings have different year built*/

	replace yrbuilt_mode = yrbuilt_max if yrbuilt_max>0 & yrbuilt_mode ==0
	rename yrbuilt orig_yrbuilt
	rename yrbuilt_mode yrbuilt
	count if yrbuilt==. /*4928*/
	count if yrbuilt ==0  /*28*/
	drop yrbuilt_max
	
	/*2. sqft*/
	bysort propnum year: egen sqft_max=max(sqft)
	by propnum year: egen sqft_mode=mode(sqft), max
	br if sqft_mode!=sqft_max
	by propnum year: egen sqft_min=min(sqft)
	br if sqft_mode!=sqft_min
	br if sqft_min!=sqft_max
	*If the observations don't have the same square footage in a year, take the mode
	*Check with Margaret and Karen about this 
	replace sqft=sqft_mode 
	count if sqft==0 | sqft==0 //1824
	drop if sqft==0 //1824
	drop sqft_max sqft_mode sqft_min
	sum sqft, detail
	eqprhistogram sqft, bin(10) ///
	plot(kdensity sqft, biweight width(5) lcolor(navy) ///
	lwidth(medthick)) title("SQFT") bgcolor(white) graphregion(color(white))
	
	**Look at the histogram tomorrow to determine if these cutoffs make sense
	count if sqft <6500
	count if sqft > 1500000
	drop if sqft <6500
	drop if sqft > 1500000
	eqprhistogram sqft, bin(25) ///
	plot(kdensity sqft, biweight width(5) lcolor(navy) ///
	lwidth(medthick)) title("SQFT") bgcolor(white) graphregion(color(white))
	
	
	/*4.percentleased*/
	sum percentleased
	count if percentleased == . 
	count if percentleased >1 & percentleased!=.
	replace percentleased = 1 if percentleased >1 & percentleased !=.
	drop if percentleased == . // 9356
	drop if percentleased>1 //0

	/*5.nooffloors*/
	sort propnum nooffloors
	bysort propnum: carryforward nooffloors, gen(filled_nooffloors) 
	count if nooffloors ==. //136
	drop if nooffloors ==.

	/*6. electricity price*/
	drop if elecprice==. //544
	
	
	
*Make the data set with the Dependent var- Utility expenditure per square foot 	
preserve
count if rl_yr_rentpersf!=.	
gen ind=1 
bysort propnum year: egen numqtrsfilled=total(ind) if rl_yr_rentpersf!=.
keep if numqtrsfilled==4

*Leaves us with 87212observations where all information is available 
count if rl_yr_utilpersf!=.


*Make the Utility Data Annual 
collapse (mean) elecprice percentleased gasprice real_elecprice real_gasprice Unemployment ///
(sum) HDD CDD real_inc_total real_exp_total real_exp_util real_utilpersf real_inc_brent ///
(min) zip cbsa cbsadiv cbsaordiv acqdate acquisitionyear acquisitiondate ///
initialcost initacqcost saleqtr yearsold saleprice grosssaleprice netsalepriceweath ///
mgrid nooffloors noofbuildings noofunits nra sqft yrbuilt lastrenovatedyear ///
yrbuiltorlastren Covered_E fake_data rating ind_yr last_year_cert years_since_cert ///
real_initialcost real_initacqcost ///
ind_cert_last5 ind_cert_last10 everrated rl_yr_utilpersf rl_yr_rentpersf ///
(firstnm) prop cbsaname propertytype propertysubtype address1 state city county ///
censusdivfips mgrname, by(propnum year)

	/*6. Outliers for variables of interest real utility */
	su rl_yr_utilpersf, detail
	drop if rl_yr_utilpersf<.03  //214
	drop if rl_yr_utilpersf>12.51 //848
	eqprhistogram rl_yr_utilpersf, bin(25) ///
	plot(kdensity rl_yr_utilpersf, biweight width(5) lcolor(navy) ///
	lwidth(medthick)) title("Util Per SQFT") bgcolor(white) graphregion(color(white))
	

save "$Fake\FakeNCREIFAnnualizedUtil", replace
restore
**Rent Data Annual 

preserve 
count if rl_yr_rentpersf!=.
gen ind=1 
bysort propnum year: egen numqtrsfilled=total(ind) if rl_yr_utilpersf!=.
keep if numqtrsfilled==4

count if rl_yr_rentpersf!=. //88792

*Make the Rent Data Annual 
collapse (mean) elecprice percentleased gasprice real_elecprice real_gasprice Unemployment ///
(sum) HDD CDD real_inc_total real_exp_total real_exp_util real_utilpersf real_inc_brent ///
(min) zip cbsa cbsadiv cbsaordiv acqdate acquisitionyear acquisitiondate ///
initialcost initacqcost saleqtr yearsold saleprice grosssaleprice netsalepriceweath ///
mgrid nooffloors noofbuildings noofunits nra sqft yrbuilt lastrenovatedyear ///
yrbuiltorlastren Covered_E fake_data rating ind_yr last_year_cert years_since_cert ///
real_initialcost real_initacqcost ///
ind_cert_last5 ind_cert_last10 everrated rl_yr_utilpersf rl_yr_rentpersf ///
(firstnm) prop cbsaname propertytype propertysubtype address1 state city county ///
censusdivfips mgrname, by(propnum year)

	*Outliers for rent 
	su rl_yr_rentpersf, detail
	eqprhistogram rl_yr_rentpersf, bin(25) ///
	plot(kdensity rl_yr_rentpersf, biweight width(5) lcolor(navy) ///
	lwidth(medthick)) title("Rent Per SQFT") bgcolor(white) graphregion(color(white))
	drop if rl_yr_rentpersf<1.00  // 128  dropped
	drop if rl_yr_rentpersf>100 // 99
	
save "$Fake\FakeNCREIFAnnualizedRent", replace
restore

*-----------------------------------------*
*Step 5.2 Summary Stat for Utility Data   *
*-----------------------------------------*	
use "$Fake\FakeNCREIFAnnualizedUtil", clear

	count if rating!=.
	*No of Obs. that have ever been rated
	drop everrated
	bysort propnum: egen everrated=min(rating)
	count if everrated!=.
	*No of obs. that have been rated in the last 5 or 10 years 
	
	*Now, I generate a value for the years since a building was rated 
	drop last_year_cert years_since_cert ind_cert_last5 ind_cert_last10
	bysort prop: carryforward ind_yr, gen(last_year_cert) 

	gen years_since_cert= year-last_year_cert

	*Value of certification has been suggested to decrease over time so, let's gen an indicator for years since cert 
	*First, 5 years or less since certification 
	gen ind_cert_last5=1 if !missing(years_since_cert) & years_since_cert<=5

	*10 years or less since certification 
	gen ind_cert_last10=1 if !missing(years_since_cert) & years_since_cert<=10
	count if ind_cert_last5==1
	count if ind_cert_last10==1
	*No ob obs that have never been estar rated 
	count if everrated==.
	*No of unique propnum in the data set- ie the number of unique buildings in the data set 
	bysort propnum: gen uniquebuilding=1 if _n==1
	count if uniquebuilding==1
	*Find the number of builings that have ever/never been certified 
	gen buildingevercert=1 if uniquebuilding==1 & everrated!=.
	count if buildingevercert==1
	gen buildingnevercert=1 if uniquebuilding==1 & everrated==.
	count if buildingnevercert==1

*Real Utility per square foot Information 
	su rl_yr_utilpersf, detail
	su rl_yr_utilpersf if rating!=., detail
	su rl_yr_utilpersf if rating==., detail
	su rl_yr_utilpersf if ind_cert_last5==1, detail
	su rl_yr_utilpersf if ind_cert_last5==., detail
	su rl_yr_utilpersf if ind_cert_last10==1, detail
	su rl_yr_utilpersf if ind_cert_last10==., detail
	
	by propnum: egen firstyearrated=min(last_year_cert)
	*trying to find the mean of the property before and after 
	*certification 
	count if firstyearrated!=.
	su rl_yr_utilpersf if !missing(firstyearrated) & year<firstyearrated, detail
	su rl_yr_utilpersf if !missing(firstyearrated) & year>=firstyearrated, detail


*-----------------------------------------*
*Step 5.2 Summary Stat for Rent Data 	  *
*-----------------------------------------*	
use "$Fake\FakeNCREIFAnnualizedRent", clear
	
	count if rating!=.
	*No of Obs. that have ever been rated
	drop everrated
	bysort propnum: egen everrated=min(rating)
	count if everrated!=.
	*No of obs. that have been rated in the last 5 or 10 years 
	
	*Now, I generate a value for the years since a building was rated 
	drop last_year_cert years_since_cert ind_cert_last5 ind_cert_last10
	bysort prop: carryforward ind_yr, gen(last_year_cert) 

	gen years_since_cert= year-last_year_cert

	*Value of certification has been suggested to decrease over time so, let's gen an indicator for years since cert 
	*First, 5 years or less since certification 
	gen ind_cert_last5=1 if !missing(years_since_cert) & years_since_cert<=5

	*10 years or less since certification 
	gen ind_cert_last10=1 if !missing(years_since_cert) & years_since_cert<=10
	count if ind_cert_last5==1
	count if ind_cert_last10==1
	*No ob obs that have never been estar rated 
	count if everrated==.
	*No of unique propnum in the data set- ie the number of unique buildings in the data set 
	bysort propnum: gen uniquebuilding=1 if _n==1
	count if uniquebuilding==1
	*Find the number of builings that have ever/never been certified 
	gen buildingevercert=1 if uniquebuilding==1 & everrated!=.
	count if buildingevercert==1
	gen buildingnevercert=1 if uniquebuilding==1 & everrated==.
	count if buildingnevercert==1

*Real Utility per square foot Information 
	su rl_yr_rentpersf, detail
	su rl_yr_rentpersf if rating!=., detail
	su rl_yr_rentpersf if rating==., detail
	su rl_yr_rentpersf if ind_cert_last5==1, detail
	su rl_yr_rentpersf if ind_cert_last5==., detail
	su rl_yr_rentpersf if ind_cert_last10==1, detail
	su rl_yr_rentpersf if ind_cert_last10==., detail
	
	by propnum: egen firstyearrated=min(last_year_cert)
	*trying to find the mean of the property before and after 
	*certification 
	count if firstyearrated!=.
	su rl_yr_rentpersf if !missing(firstyearrated) & year<firstyearrated, detail
	su rl_yr_rentpersf if !missing(firstyearrated) & year>=firstyearrated, detail

	
